PATENT APPLICATION

Inventor          Citizenship    Residence City and State
Andrew CONWAY     Australia      Menlo Park, CA

The assignee is Silicon Genetics, having an office in Redwood City, California.

TITLE OF THE INVENTION

Detecting Recessive Diseases in Inbred Populations

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to detecting recessive diseases in inbred populations, for example moderately inbred populations such as the Amish population.

2. Background of the Invention

Many rare recessive diseases occur more frequently in certain inbred populations. One example of such a population is the Amish. Because the gene pool in an inbred population is more limited, expression of recessive genetic diseases can occur more frequently than in other populations. In particular, the chance can be higher for a child to inherit a matched pair of recessive alleles associated with a disease from his or her parents.

A brute-force approach could be used to try to correlate particular alleles with genetic diseases in the population. For example, it would be technically possible to sequence the entire genome of every member of one of these populations using conventional techniques. Gene sequences that coincide with occurrences of certain diseases could then be identified. However, extensive sequencing of an entire population, even a small one, would simply cost too much. Very few businesses, or even governments, would be able to afford the multi-billion-dollar or even higher price of such an undertaking.

A more affordable technique would be to identify a region of the genome that is associated with the genetically-linked disease. Research can then focus on this region in a more cost-effective manner.

SUMMARY OF THE INVENTION

Accordingly, what is needed is a technique that tends to identify a general region of a human genome that contains genetic component(s) that contribute to or cause a genetically-linked recessive disease. The invention disclosed herein attempts to produce such results in the context of diseases that occur relatively more frequently in a relatively inbred population.

The invention addresses this need through techniques that use statistical analysis of genetic data to determine, based upon markers, regions in the genome that are likely to be associated with a recessive genetic disease or trait. One embodiment of these techniques includes the steps of obtaining actual genotype data for one or more affected people with the genetic disease or trait in a population and/or actual genotype data for their parents, obtaining estimated genotype data for the population, and analyzing the actual and estimated genotype data to find a region in the genome of the affected people that includes markers exhibiting particular homozygous pairs of alleles more frequently than would occur randomly.

The techniques of the invention are particularly applicable to a population that is relatively inbred and that has a higher occurrence of the genetic disease or trait than a more general population. In such a population, the particular homozygous pairs of alleles that occur more frequently tend to be autozygous alleles descended from a founder of the genetic disease or trait.

In one embodiment, analyzing the genotype data further includes the steps of determining scores for each marker in the genotype data relative to each person for which actual genotype data was determined, merging the scores to arrive at a merged score for each marker, and determining a region of markers that has a high run of merged scores.

Preferably, a score for a marker represents a probability that a genotype measured for a person would actually be measured, given some assumption about the autozygosity at each marker's location. This approach results in a marker receiving a higher score from one form of homozygosity versus another form of homozygosity. The form that receives the higher score tends to be more likely to be associated with the genetic disease or trait.

After the scores are determined, they can be placed in an array ordered by a chromosomal order of markers associated with the scores. This facilitates analysis of the data, for example using a computer.

In one embodiment, the region of markers that has the high run of merged scores has the highest run of merged scores in the array. This region can be found by determining a consecutive portion of the array that has the highest sum. In this embodiment, runs of all possible lengths are considered. For example, if the total array of merged scores has 100 scores, the highest-scoring run might be 10 scores long, 20 scores long, or any other number of scores long.

High-scoring runs besides the highest-scoring run also can be of interest. For example, the next-highest runs might be of interest. Also, different techniques for finding runs of high scores (but not necessarily the highest run) can be used. In one such embodiment, the region of markers that has the high run of merged scores is found by computing all sums of a predetermined fixed number of adjacent elements in the array and comparing the sums. For example, if the total array of merged scores has 100 scores, the sums of all 10 score runs could be computed, resulting in 91 sums that could then be compared. Other techniques can be used.

Once a region with a high run of merged scores is found, actual sequencing by traditional techniques, or other analysis, can be performed on the DNA of the people in the population, preferably including people with the genetic disease or trait at issue, in or near the region that has the high run of merged scores. This sequencing may help find alleles or genetic patterns that cause the disease or trait. Because only this limited region is sequenced, this sequencing is far more affordable and feasible than sequencing the entire genome of every member of the subject population.



The invention also encompasses apparatuses, hardware, and software adapted to perform the steps of the foregoing techniques, as well as other embodiments of the invention.

After reading this application, those skilled in the art would recognize that the techniques described herein provide an enabling technology, with the effect that advantageous features can be provided that heretofore were substantially infeasible.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 illustrates inheritance of a genetic disease in a relatively inbred population.

Figure 2 is an illustration of inheritance of alleles from parents to a child.

Figure 3 is a flowchart showing steps for statistical analysis of genetic data according to one aspect of the invention.

Figure 4 is a table showing calculations that can be used in the statistical analysis of genetic data.

Figure 5 is a table showing results of calculations of scores for markers.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Preferred embodiments of the invention are described herein, including preferred device coupling, device functionality, and process steps. After reading this application, those skilled in the art would realize that embodiments of the invention might be implemented using a variety of other techniques not specifically described herein, without undue experimentation or further invention, and that such other techniques would be within the scope and spirit of the invention.

Definitions

The general meaning of each of these terms is intended to be illustrative and in no way limiting.

• The phrase "DNA" refers to a nucleic acid found in the nucleus of an organism's cells. DNA encodes information used by the organism to generate proteins, which in turn determine the physical characteristics of that organism. DNA is formed from two strands connected together in the shape of a double helix.

• The phrase "base pairs" refers to chemicals (i.e., nucleotides) that connect together the two strands that form a DNA double helix. The four possible base pair chemicals in DNA are adenine, thymine, guanine and cytosine. Adenine on one strand always bonds to thymine on the other strand in the double helix; guanine always bonds to cytosine. These chemicals are often abbreviated by their first letter (e.g., A, T, G and C).

• The phrase "genome" refers to the entire DNA sequence of an organism such as a person. An organism's genome is often represented by a listing of abbreviations for the bases in the sequence, for example ATTACGGCACTG. . . .

• The phrase "chromosome" refers to a portion of a human genome on which genetic sequences are linearly laid out; genetic sequences can be "near" each other on a chromosome if there are relatively few base pairs between them. Organisms include two copies of each chromosome, which are called homologues of each other. Each homologue of a chromosome includes the same markers, but can include different alleles for those markers.

• The phrase "marker sequence" or "marker" refers to a genetic sequence (i.e., DNA found on a chromosome) that has more than one variant in the general population. Because an organism generally has two copies of each chromosome, the organism will have two copies of each marker, which may be the same or different from each other.

• The phrase "allele" refers to any variant form of a marker. Alleles are often abbreviated with letters such as A, B, C, etc. The pair of alleles that a person has for the two copies of a particular marker is often abbreviated as AA, AB, BA, BB, AC, etc.

• The phrase "genotype" refers to the particular genetic makeup at specified locations (e.g., markers) in the DNA of an organism.

• The phrase "genotyping" refers to the process of determining a genotype for an organism.

• The phrase "recessive" refers to a disease or trait that is only active if the same allele is present in both copies of the genetic variation that causes the disease or trait. The phrase "dominant" refers to a disease or trait that is active if the allele is present in even only one of the two copies of the genetic variation that causes the disease or trait. For example, if A is an allele for a recessive disease or trait and B is an allele for a dominant disease or trait, a person with alleles AA generally will express the recessive disease or trait, while a person with alleles AB, BA, or BB generally will express the dominant disease or trait.

• The phrase "homozygous" indicates two genetic sequences that are the same from both a person's mother and father. If homozygous genetic sequences are for an allele for a recessive genetic disease or trait, that disease or trait generally will be expressed in the person.

• The phrase "heterozygous" indicates two genetic sequences that are different from the mother and the father.

• The phrase "founder" refers to an individual, or a small set of individuals, who brought a disease sequence into a population.

• The phrase "autozygous" indicates homozygous where the genetic sequences that are the same come from a common source such as a founder.

• The phrase "disease sequence" refers to a genetic sequence, for example an allele, that causes or is associated with a particular disease.

The scope and spirit of the invention are not limited to any of these definitions, or to specific examples mentioned therein, but are intended to include the most general concepts embodied by these and other terms.

Overview

Figure 1 illustrates inheritance of a genetic disease in a relatively inbred population.

In Figure 1, population 1 is relatively inbred compared to a more general population. For example, the Amish population is relatively inbred compared to the general population of the United States or to the general population in regions where the Amish live.

At some time in the past, founder 2 introduced a genetic disease into the population. The disease is assumed to be recessive. Thus, in order for the disease to be expressed, a person must have two matching alleles for the disease at the corresponding location in the person's DNA.

In order to have two matching alleles for the disease, one must have come from the person's mother and one from the person's father. This situation is known as "homozygosity." Furthermore, because these alleles were introduced by a single founder, the alleles are said to be "autozygous."



In Figure 1, founder 2 had at least two offspring that each carried one allele for the genetic disease introduced by the founder. These alleles were passed by subsequent offspring until they met at affected person 3 in the population through parents 4 and 5.

Generally, the paths taken by the alleles from a founder to an affected person do not cross. Otherwise, the person at whom they crossed would be an affected person. However, in some instances, the paths might cross. For example, if the disease is not terminal, the person might have passed one of the alleles on to a descendant. Likewise, if some other genetic or environmental factor is necessary for expression of the disease, the paths might have crossed without the disease being expressed.

Figure 2 is an illustration of inheritance of alleles from parents to a child. The particular combinations of alleles shown and discussed with respect to Figure 2 are illustrative only. The invention is not limited to these particular alleles, markers, and disease alleles.

In Figure 2, child 3 suffers from the recessive genetic disease under study. The child inherited one set of alleles 8 from father 4 and one set of alleles 9 from mother 5, as illustrated by the curved arrows.

The disease allele A is a recessive disease-causing allele. Because two of these recessive alleles are present, the disease will be expressed in the child.

Marker alleles 10 and 11 are nearby alleles that are useful as markers. Father 4 and mother 5 in Figure 2 each have one copy of these marker alleles.

In some cases, these alleles might be single nucleotide polymorphisms (SNPs). Other types of marker alleles can be used. For example, in Figure 2, three different types of alleles are present, so these markers are not SNPs.

Both the disease alleles and the marker alleles are homozygous, meaning that they are the same from both the child's mother and father. The disease alleles and the nearby marker alleles ultimately originated with the founder (not shown). Thus, these alleles are also autozygous.

Sets of alleles 8 and 9 are slightly different from each other because sets of alleles on a chromosome do not necessarily pass as a complete group. Some cross-over of alleles between homologues typically occurs from one generation to the next, resulting in mixing of alleles. The difference between sets of alleles 8 and 9 (in the second marker from the top) could be the result of such cross-over at some point in the line of descent from the founder to the parents. Other causes (e.g., mutation) could also account for such differences, which may or may not be present to varying degrees.

One result of allele cross-over is that marker alleles from the founder might appear when disease alleles are not present, and marker alleles might be absent when disease alleles are present. However, nearby alleles are more likely to stay together from one generation to the next than distant alleles. Thus, the more common case is that the same nearby marker alleles appear in an affected person as appeared in the founder.

Thus, Figure 2 illustrates that a child with a pair of disease alleles is likely to have copies of nearby markers possessed by the founder. Furthermore, the parents are each likely to have at least one copy of the nearby markers.

The presence of these markers can be used to help locate a chromosomal region close to alleles causing or otherwise associated with the genetic disease. The overall approach of the invention is to try to find chromosomal regions for people with the disease under study that show a pattern more consistent than would occur by chance. Part of this pattern is the presence of homozygous alleles that occur more frequently than would be expected by chance. Another part of this pattern is the presence of one type of homozygous alleles more frequently than other types.

In more detail, as discussed above, markers near to disease alleles tend to come from the same founder and tend to pass along with the disease alleles. As a result, the same pattern of marker alleles as found in the founder should tend to be more prevalent in affected people. Thus, in the example shown in Figure 2, affected persons should have alleles BB for marker 10 and alleles AA for marker 11 much more frequently than other combinations of markers. Accordingly, particular combinations of homozygous markers that occur more frequently than other combinations of markers are of particular interest.

One embodiment of the invention that takes advantage of the foregoing observations is basically a two-step process.

First, scores are generated for each marker in the genotypes of members of a population that exhibit a recessive genetic disease. Each score represents a probability that a genotype measured for a person would actually be measured, given some assumption about the autozygosity at each marker's location.

Second, the scores are merged for all people in the population affected by the disease under consideration. This results in one score for each marker. Then, the scores are searched for a high or highest valued run. This run corresponds to markers that are likely to have descended along with the disease allele from the founder and therefore are likely to be close to the disease alleles.

Once a region likely to contain the disease allele is identified, actual sequencing of the DNA in or near this region can be performed using well known traditional techniques (or other techniques as they become developed). This sequencing can be performed on people with the genetic disease at issue, as well as on other people in the population. Because only a limited region of the DNA is being sequenced, this process is much more feasible than a brute-force sequencing of the entire genome (i.e., all the DNA) for every member of the population with the disease. Other known or developed techniques for studying the identified region also can be utilized.

Steps for implementing the foregoing technique are discussed in more detail below with reference to Figures 3 and 4.

Statistical Analysis

Figure 3 is a flowchart showing steps for statistical analysis of genetic data to determine likely markers for a recessive genetic disease or trait.

As indicated in note 30, the steps in Figure 3 can be implemented on a computer, network, web site, etc., using either general purpose or special purpose hardware and software. In these embodiments, arrays are particularly useful for handling genotype data and scores. Of course, the invention is not limited to use of arrays or to computer-implemented embodiments.

In step 31, actual genotype data is determined for one or more affected persons with the genetic disease under consideration. This genotype data is not a full sequencing of the person's DNA. Rather, the genotype data is an identification of particular alleles at a selected set of markers in the person's DNA. For example, a set of SNP markers could be determined for the affected person(s). Such genotyping is far less expensive than full DNA sequencing.

Actual genotype data also can be determined for the parents of affected persons.

In step 32, estimates are obtained of genotype frequency data for the entire inbred population to which the affected persons and their parents belong. When determining these estimates, it can be assumed that the alleles a child gets for any marker from his or her parents are independent.
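By way of illustration only, the estimates of step 32 might be computed as in the following Python sketch. It assumes, beyond what is stated above, that a sample of genotypes at one marker is available as two-letter strings such as "AB"; this encoding and the function names are hypothetical. Allele frequencies are estimated by simple counting, and genotype frequencies then follow from the independence assumption.

```python
from collections import Counter


def estimate_allele_frequencies(genotypes):
    """Estimate allele frequencies at one marker by counting alleles.

    genotypes: iterable of two-letter strings such as "AB" (assumed encoding).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def genotype_frequency(freqs, genotype):
    """Frequency of an unordered genotype, assuming the two alleles a
    child receives from his or her parents are independent."""
    first, second = genotype
    p = freqs.get(first, 0.0) * freqs.get(second, 0.0)
    # A heterozygous genotype can arise in two orders (AB or BA).
    return p if first == second else 2 * p


# Made-up sample of four genotyped people at a single marker.
sample = ["AA", "AB", "BB", "AB"]
freqs = estimate_allele_frequencies(sample)   # {"A": 0.5, "B": 0.5}
print(genotype_frequency(freqs, "AA"))        # 0.25
print(genotype_frequency(freqs, "AB"))        # 0.5
```

The sketch does not model the error rate of the estimates; that would be a separate refinement.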

In one embodiment, the estimates are found by actually genotyping a subset of the population. An error rate e for the estimates can be assumed, with the presence of the error indicating that a measured value in the genotyping is a result of a random selection from the population. Standard statistical techniques can be used to determine the error rate e from the size of the subset and the size of the overall population under consideration. Other techniques can be used to find the estimates without departing from the invention.

Scores are determined in step 33 for the markers selected for the genotyping. A score is determined in turn for each marker relative to each affected member or parent for which actual genotype data was determined in step 31.

Figure 4 shows a table with probability calculations that can be used to determine the scores according to one embodiment of the invention. Several variables are used in these calculations, as follows:

n = a number of alleles possible for the marker under consideration, designated as A, B, C, etc.; for markers that are SNPs, n is usually two;

p_X = the estimated frequency of allele X in the population, as determined in step 32, with X being one of A, B, C, etc. (e.g., p_A = the estimated frequency of allele A at the marker, p_B = the estimated frequency of allele B at the marker, etc.);

p_X^M = the probability that an affected person got allele X at the marker under consideration from his or her mother; if the mother's genotype at the marker is known, this can be determined using standard Mendelian genetics and will be 0, 0.5, or 1; otherwise p_X is used;

p_X^F = the probability that an affected person got allele X at the marker under consideration from his or her father; if the father's genotype at the marker is known, this can be determined using standard Mendelian genetics and will be 0, 0.5, or 1; otherwise p_X is used.
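As a non-limiting sketch, the maternal and paternal transmission probabilities defined above might be computed as follows in Python. The function name, the two-letter genotype encoding, and the dictionary of population frequencies are assumptions made for illustration. When the parent's genotype is known, the result is the fraction of the parent's two alleles that match, which is 0, 0.5, or 1; otherwise the population frequency is used.

```python
def transmission_probability(allele, parent_genotype=None, population_freqs=None):
    """Probability that a child received `allele` from one parent.

    parent_genotype: two-letter string such as "AB", or None if unknown
                     (assumed encoding).
    population_freqs: dict mapping allele -> estimated population frequency,
                      used only when the parent's genotype is unknown.
    """
    if parent_genotype is None:
        return population_freqs[allele]
    # Mendelian inheritance: each of the parent's two alleles is passed
    # on with probability one half.
    return parent_genotype.count(allele) / 2


freqs = {"A": 0.3, "B": 0.7}   # made-up population frequencies
print(transmission_probability("A", "AA"))                   # 1.0
print(transmission_probability("A", "AB"))                   # 0.5
print(transmission_probability("A", "BB"))                   # 0.0
print(transmission_probability("A", population_freqs=freqs)) # 0.3
```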

In order to find a score for a marker relative to an affected person or parent of an affected person, the row of the table in Figure 4 is selected that corresponds to the observed genotype data for that person or parent. The calculations in that row are performed to determine probabilities of observing that marker given various types of autozygosity with the founder and also the probability of observing that marker in the absence of autozygosity.

For each marker, this process is repeated relative to each affected person or parent of an affected person for whom actual genotype data is available. The result is a collection of scores for each marker representing probabilities of different types of autozygosity relative to each affected person or parent, as illustrated in Figure 5.



Markers will receive higher scores for some forms of homozygosity as compared to other forms. The forms that receive the higher scores tend to be more likely to be associated with the genetic disease or trait.

The tables in Figures 4 and 5 can be expanded using basic rules of symmetry to accommodate other possible combinations of alleles. These tables can also be expanded to more complex pedigree information (e.g., grandparents).

Next, in step 34, the scores are merged.

First, scores for each type of autozygosity for each marker are multiplied together. For example, in Figure 5, scores in group 41 are multiplied together, scores in group 42 are multiplied together, and scores in group 43 are multiplied together. This is repeated for all markers.

Second, the products for each type of autozygosity are summed, weighted by the probability of that allele for that marker in the population. For example, the products from multiplying groups 41, 42 and 43 are summed. This is repeated for all markers. The result is a score representing the likelihood of observing the actual measured value for the marker given that the marker is autozygous (i.e., homozygous and inherited from the founder).



Third, scores for the "not autozygous" case for each marker are multiplied together. For example, scores in group 44 are multiplied together. This is repeated for all markers. The result is a score representing the likelihood of observing the actual measured value for the marker given that the marker is not autozygous and comes independently from the overall population distribution (i.e., is not from the founder).

More formally, if O is a set of genotype measurements believed to come from a single founder (i.e., genotypes of persons affected by the disease or trait under study), o is one of the genotypes in O, Pr(o | autozygous i) and Pr(o | not autozygous) come from the table in Figure 5 (which in turn comes from the table in Figure 4), and i is an index of different possible alleles at each marker, then

\[ \Pr(O \mid \text{autozygous } i) = \prod_{o \in O} \Pr(o \mid \text{autozygous } i), \]

\[ \Pr(O \mid \text{autozygous}) = \sum_{i} p_i \, \Pr(O \mid \text{autozygous } i), \quad \text{and} \]

\[ \Pr(O \mid \text{not autozygous}) = \prod_{o \in O} \Pr(o \mid \text{not autozygous}). \]



EL 768 962 920 US 



208.1005.01 



Fourth, the ratio of Pr(O | autozygous) to Pr(O | not autozygous) is computed for each marker. Preferably, a log base 10 is taken of each ratio. More formally:

\[ \text{Marker Score} = \log_{10} \left[ \frac{\Pr(O \mid \text{autozygous})}{\Pr(O \mid \text{not autozygous})} \right] \]

The resulting score is comparable to a LOD score obtained through different types of analysis such as genetic linkage or sib pair analysis.
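The merging of step 34 and the resulting marker score can be sketched in Python as follows. The per-person probabilities are taken as inputs because they come from the tables of Figures 4 and 5, which are not reproduced here; the numeric values in the usage lines are invented purely for illustration, and the function and parameter names are hypothetical.

```python
import math


def marker_score(allele_freqs, pr_autozygous, pr_not_autozygous):
    """Log10 likelihood ratio for one marker.

    allele_freqs: dict mapping allele i -> estimated population frequency p_i.
    pr_autozygous: one dict per observed genotype o, mapping allele i ->
                   Pr(o | autozygous i).
    pr_not_autozygous: one value per observed genotype o,
                       Pr(o | not autozygous).
    """
    # Product over observations for each allele, then the sum over
    # alleles weighted by p_i.
    pr_auto = sum(
        p_i * math.prod(row[i] for row in pr_autozygous)
        for i, p_i in allele_freqs.items()
    )
    # Product over observations for the "not autozygous" case.
    pr_not = math.prod(pr_not_autozygous)
    # Note: a zero probability on either side makes the ratio undefined;
    # this sketch does not guard against that case.
    return math.log10(pr_auto / pr_not)


# Invented values for two affected persons at one marker.
freqs = {"A": 0.5, "B": 0.5}
auto = [{"A": 1.0, "B": 0.0}, {"A": 1.0, "B": 0.0}]
not_auto = [0.25, 0.25]
print(marker_score(freqs, auto, not_auto))   # log10(8), about 0.903
```

A positive score means the observations are more probable under autozygosity than under independent draws from the population.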

The foregoing order of mathematical operations is chosen merely for the sake of convenience of explanation; other orders can be used without departing from the invention. These orders include, but are not limited to, maintaining running products and sums, performing simultaneous multiplication and summing operations, and the like.

The end result of step 34 is a score for each marker for which genotype data was collected. These scores can be arranged in an array or otherwise ordered in accordance with the order of the markers on chromosomes.
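One possible way to arrange the merged scores in chromosomal order is sketched below in Python. The record layout (a chromosome number and a position along the chromosome for each marker) is an assumption made for illustration and is not specified in this application.

```python
from typing import NamedTuple


class MarkerScore(NamedTuple):
    marker_id: str     # hypothetical marker name
    chromosome: int    # hypothetical chromosome number
    position: int      # hypothetical offset along the chromosome
    score: float       # merged score from step 34


def scores_in_chromosomal_order(records):
    """Return (scores, marker_ids) sorted by chromosome, then position."""
    ordered = sorted(records, key=lambda r: (r.chromosome, r.position))
    return [r.score for r in ordered], [r.marker_id for r in ordered]


records = [
    MarkerScore("m3", 2, 100, 0.1),
    MarkerScore("m1", 1, 500, 1.2),
    MarkerScore("m2", 1, 50, -0.3),
]
scores, ids = scores_in_chromosomal_order(records)
print(ids)      # ['m2', 'm1', 'm3']
print(scores)   # [-0.3, 1.2, 0.1]
```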

The scores themselves are intrinsically interesting because the computations up to this point are relatively conservative. Thus, high scores are very likely to be significant.

In step 35, the merged scores are examined to find a run of high scores. In the preferred embodiment, the contiguous run of scores with the highest sum is found. Known techniques exist for finding a consecutive region with the highest sum in an array of numbers. One such technique is briefly described below:

1. Set a "running score" variable S to 0
2. Set a "current region start" variable C to clear
3. Set a "best region" variable B to clear
4. Set a "highest score" variable H to 0
5. Loop over all scores in the array in chromosomal order
   a. Let MS be the Marker Score at the current place in the loop
   b. Add MS to S
   c. If S is zero or less, the marker is not interesting; set S to 0 and clear C
   d. If S is greater than zero, the marker may be interesting; if C is clear, set C to this marker
   e. If S is greater than H, this is the best region so far; set B to start at C and end at this marker; set H to S

The chromosomal region corresponding to the "best region" B is likely to include or at least to be near the disease-causing alleles.
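The loop described above can be sketched in Python as follows. Variable names mirror S, C, B, and H; representing "clear" as None and returning array indices rather than marker names are choices made for this sketch. The sketch records each new best region by updating the highest score H from the running score S.

```python
def best_region(scores):
    """Find the contiguous run of scores with the highest sum.

    Returns (start_index, end_index, total), with both indices inclusive,
    or (None, None, 0.0) if no run has a positive sum.
    """
    running = 0.0          # S: running score
    current_start = None   # C: current region start ("clear" is None)
    best = (None, None)    # B: best region found so far
    highest = 0.0          # H: highest score found so far
    for idx, ms in enumerate(scores):
        running += ms
        if running <= 0:
            # The run so far is not interesting; start over.
            running = 0.0
            current_start = None
        else:
            if current_start is None:
                current_start = idx
            if running > highest:
                highest = running
                best = (current_start, idx)
    return best[0], best[1], highest


print(best_region([-1, 2, 3, -4, 5, -10, 1]))   # (1, 4, 6.0)
```

The loop visits each score once, so its cost grows linearly with the number of markers.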

High-scoring runs besides the highest-scoring run also can be of interest. For example, the next-highest runs determined using the foregoing technique might be of interest. A statistically significant jump or gap in scores between high-scoring runs and low-scoring runs could be used to select interesting regions. For example, if the highest scoring run has a score of 20, the next highest non-overlapping run has a score of 18 or 19, and the next highest non-overlapping run after that has a score of 6, then the regions corresponding to scores of 18 or 19 and 20 might be of interest.

In addition, other techniques for finding runs of high scores (but not necessarily the highest run) can be used. In one such embodiment, the region of markers that has the high run of merged scores is found by computing all sums of a predetermined fixed number of adjacent elements in the array and comparing the sums. For example, if the total array of merged scores has 100 scores, the sums of all 10 score runs could be computed, resulting in 91 sums that could then be compared. Other techniques can be used.
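The fixed-length alternative can be sketched in Python as follows. The function name is hypothetical, and the running-total update is an implementation choice for this sketch rather than something specified above; summing each window directly would give the same results.

```python
def window_sums(scores, width):
    """Return the sums of every run of `width` adjacent scores.

    An array of n scores yields n - width + 1 sums (none if width is
    not between 1 and n).
    """
    if width <= 0 or width > len(scores):
        return []
    total = sum(scores[:width])
    sums = [total]
    for i in range(width, len(scores)):
        # Slide the window one place: add the new score, drop the oldest.
        total += scores[i] - scores[i - width]
        sums.append(total)
    return sums


scores = list(range(100))         # placeholder scores 0..99
sums = window_sums(scores, 10)
print(len(sums))                  # 91
best_start = max(range(len(sums)), key=sums.__getitem__)
print(best_start, sums[best_start])   # 90 945
```

The index of the largest sum gives the start of the highest-scoring run of the chosen length.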

Once a region with a high run of merged scores is found, actual sequencing of the DNA in or near this region can be performed in step 36 using well known traditional techniques (or other techniques as they become developed). This sequencing can be performed on people with the genetic disease at issue, as well as on other people in the population. Because only a limited region of the DNA is being sequenced, this process is much more feasible than a brute-force sequencing of the entire genome (i.e., all the DNA) for every member of the population with the disease. Other known or developed techniques for studying the identified region also can be utilized.

Genetic Traits Other Than Disease

The foregoing discussion was in the context of a recessive genetic disease. However, the techniques of the invention are equally applicable to studies of recessive genetic traits. Application of these techniques to non-disease traits would not require further invention or undue experimentation.

Computer-Implemented Embodiments

Those skilled in the art would recognize, after perusal of this application, that embodiments of the invention may be implemented using one or more general purpose processors or special purpose processors adapted to particular process steps and data structures operating under program control, that such process steps and data structures can be embodied as information stored in or transmitted to and from memories (e.g., fixed memories such as DRAMs, SRAMs, hard disks, caches, etc., and removable memories such as floppy disks, CD-ROMs, data tapes, etc.) including instructions executable by such processors (e.g., object code that is directly executable, source code that is executable after compilation, code that is executable through interpretation, etc.), and that implementation of these process steps and data structures using such equipment would not require undue experimentation or further invention. For example, and without limitation, embodiments of the invention can be implemented on a desktop or laptop computer with standard input and output interfaces.

Alternative Embodiments

Although preferred embodiments are disclosed herein, many variations are possible which remain within the concept, scope, and spirit of the invention. These variations would become clear to those skilled in the art after perusal of this application.

After reading this application, those skilled in the art will recognize that these alternative embodiments and variations are illustrative and are intended to be in no way limiting. After reading this application, those skilled in the art would recognize that the techniques described herein provide an enabling technology, with the effect that advantageous features can be provided that heretofore were substantially infeasible.